calico fixes #7184

Merged
merged 2 commits into from
Nov 19, 2024

Conversation

disconn3ct
Contributor

Add more missing configs for Calico. With this config it joins the mesh successfully and seems to be working normally.

@MichaIng MichaIng added this to the v9.7 milestone Aug 14, 2024
@MichaIng MichaIng modified the milestones: v9.7, v9.8 Aug 26, 2024
@StephanStS StephanStS modified the milestones: v9.8, v9.9 Oct 27, 2024
@disconn3ct
Contributor Author

What else should I do to get this merged?

@MichaIng
Owner

Sorry for taking so long to review this. I added some comments above. Most of it seems fine. I guess you explicitly disabled some features that are implicitly enabled with their parent configs? IMO this is not needed: if we enable a parent feature, it is fine to let the kernel enable its sub-features according to its defaults. Tailoring it too much could be confusing, since users would see a kernel module or feature that is not fully functional the way they are used to on plain Debian or other distros.

@disconn3ct
Contributor Author

Basically everything came from getting sudo k3s check-config to green.

It was a while ago and I don't remember what magic I used to get the config delta. I think it was menuconfig and the built-in config diff. (Looks like it was 6.10.11.)

@MichaIng
Owner

Let me undo the questionable parts and try with that kernel. If it really does not work, we can check again. But I cannot imagine that it requires features which are not even available in the official Debian kernel, or that it requires some otherwise enabled features to be explicitly disabled. I am not sure how this config check tool works, but before we do weird stuff, we should at least fully understand why it thinks it requires this, and what for.

- Quartz64 | Remove features which are not available in the Debian and RPi kernels either, and do not explicitly disable features which are not explicitly disabled in the Debian or RPi kernel. For sub-features, we better follow Linux defaults and what other distros and images do, to avoid confusion from a disabled sub-feature that is commonly enabled and expected. We also do not want an overly extended kernel: if a feature is not even enabled in the rich default Debian and/or RPi kernel, it should first be better understood what it is really needed for.
@MichaIng
Owner

The commit history is transparent here, so we can recover any of this, if needed. But let's go with this for now, basically matching the well known and much used Debian and RPi kernel builds. I'll create an image from this ASAP.

@MichaIng MichaIng merged commit b2607b2 into MichaIng:dev Nov 19, 2024
1 check passed
@MichaIng
Owner

@disconn3ct can you test with this kernel build: https://dietpi.com/downloads/binaries/testing/
Or with the respective image here: https://dietpi.com/downloads/images/testing/

@disconn3ct
Contributor Author

Crashloop.

calico-node-m6xg4 calico-node 2024-11-22 13:02:59.842 [WARNING][76] felix/int_dataplane.go 2162: Failed to synchronize routing table, will retry...
calico-node-m6xg4 calico-node 2024-11-22 13:02:59.943 [INFO][76] felix/wireguard.go 1704: Trying to connect to linkClient ipVersion=0x4
calico-node-m6xg4 calico-node 2024-11-22 13:02:59.944 [INFO][76] felix/route_rule.go 189: Trying to connect to netlink
calico-node-m6xg4 calico-node 2024-11-22 13:02:59.946 [ERROR][76] felix/route_rule.go 248: Failed to list routing rules, retrying... error=operation not supported ipVersion=4
calico-node-m6xg4 calico-node 2024-11-22 13:02:59.947 [WARNING][76] felix/int_dataplane.go 2162: Failed to synchronize routing table, will retry...
calico-node-m6xg4 calico-node 2024-11-22 13:03:00.051 [INFO][76] felix/wireguard.go 1704: Trying to connect to linkClient ipVersion=0x4
calico-node-m6xg4 calico-node 2024-11-22 13:03:00.052 [INFO][76] felix/route_rule.go 189: Trying to connect to netlink
calico-node-m6xg4 calico-node 2024-11-22 13:03:00.054 [ERROR][76] felix/route_rule.go 248: Failed to list routing rules, retrying... error=operation not supported ipVersion=4

And k3s check-config (trimmed):

Optional Features:
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_RT_GROUP_SCHED: missing
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
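
The "missing" entries in the trimmed output above can be pulled out mechanically. This is only a minimal sketch: `missing_opts` is a hypothetical helper name (not part of K3s), and it assumes plain-text output without colour codes, formatted like the paste above.

```shell
# Hypothetical helper: print the name of every option that a
# check-config style listing (read from stdin) reports as "missing".
# Assumes lines of the form "  - CONFIG_FOO: missing" without colour codes.
missing_opts() {
    sed -n 's/^[[:space:]]*- \(CONFIG_[A-Za-z0-9_]*\): missing.*$/\1/p'
}
```

Piping the saved output through it (for example `missing_opts < check-config.txt`, where the file name is just a placeholder) would list only the option names, which makes it easier to compare them against another kernel's config.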

@MichaIng
Owner

MichaIng commented Nov 22, 2024

But the features listed as "missing" are all also listed as "optional", and do not seem at all related to listing routing rules. Can you point me to the source code of this felix/route_rule.go script to check what exactly it tries to do?

Found it: https://github.com/projectcalico/calico/blob/master/felix/routerule/route_rule.go#L244
So it is about listing routing rules via netlink, which does not seem to be related to these 3 options. I'll check.

EDIT:

root@SOQuartz:~# ip rule
RTNETLINK answers: Operation not supported
Dump terminated

Yeah, that is an issue.

I think I found them:

CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_MULTIPLE_TABLES=y

These are needed for any further routing capabilities, e.g. also when providing a hotspot or AP. They are somewhat essential and also enabled in the Debian and RPi kernels. I'll rebuild the kernel with these.
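
To see whether a given kernel config has these two options set, a small check along the following lines could be used. It is a sketch under assumptions: `kconfig_state` is a hypothetical helper name, and it expects an uncompressed config file (for instance one extracted from `/proc/config.gz`, if the running kernel exposes it).

```shell
# Hypothetical helper: report the state of kernel config options.
# Usage: kconfig_state CONFIG_FILE OPTION...
# CONFIG_FILE must be an uncompressed kernel config.
kconfig_state() {
    cfg=$1
    shift
    for opt in "$@"; do
        # Take the value after "=" of the first "OPTION=..." line, if any.
        val=$(grep -E "^${opt}=" "$cfg" | head -n 1 | cut -d= -f2)
        case $val in
            y) echo "$opt: enabled" ;;
            m) echo "$opt: enabled (as module)" ;;
            # Anything else ("is not set", absent, or a non-y/m value)
            # is reported as missing in this sketch.
            *) echo "$opt: missing" ;;
        esac
    done
}
```

For example, after `zcat /proc/config.gz > /tmp/kconfig`, running `kconfig_state /tmp/kconfig CONFIG_IP_ADVANCED_ROUTER CONFIG_IP_MULTIPLE_TABLES` would print one line per option.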

@MichaIng
Owner

So that works now:

root@SOQuartz:~# ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

@disconn3ct can you test again with the new kernel build (same directory)? I did not rebuild images, but can do so, if it makes things easier for you.
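
The manual `ip rule` check above could also be wrapped into a quick probe, e.g. to run before deploying Calico on a new kernel build. This is only a sketch: `policy_routing_ok` is a hypothetical function name, and it relies solely on the exit status of `ip rule show`.

```shell
# Hypothetical probe: succeed if routing rules can be listed.
# On a kernel without multiple routing table support, "ip rule" fails
# with "RTNETLINK answers: Operation not supported".
policy_routing_ok() {
    if ip rule show >/dev/null 2>&1; then
        echo "policy routing: supported"
    else
        echo "policy routing: NOT supported (check CONFIG_IP_MULTIPLE_TABLES)"
        return 1
    fi
}
```

Calling `policy_routing_ok` prints one line and returns non-zero when listing rules fails, so it can be used in a script condition.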

@disconn3ct
Contributor Author

all also listed as "optional"

Right, optional because some configurations need them and some don't. Calico needs them.

new kernel build

The new build is working. 👍 I haven't fully exercised it but it seems pretty functional so far. Calico is happy.

@MichaIng
Owner

Right, optional because some configurations need them and some don't. Calico needs them.

Obviously they are optional for Calico as well, as the new kernel works. Block device throttling, real-time/low-latency task scheduling and IPsec are obviously not needed in our case. Interesting that it did not check those essential routing table features, but I am glad we found them.

@disconn3ct
Contributor Author

I meant to get back to this earlier, but just fyi dietpi on an rpi4 has BLK_DEV_THROTTLING on. (The only other difference is that hugeTLB is off on the rpi and on for soquartz64.)
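
A comparison like this between two kernels can be sketched with standard tools. The helper name `kconfig_only_in` is hypothetical, and both arguments are assumed to be uncompressed kernel config files (for instance saved from each board's `/proc/config.gz`).

```shell
# Hypothetical helper: list options that are set (=y or =m) in config A
# but not set in config B.
# Usage: kconfig_only_in CONFIG_A CONFIG_B
kconfig_only_in() {
    a=$(mktemp)
    b=$(mktemp)
    # Reduce each config to a sorted list of enabled option names.
    grep -E '^CONFIG_[A-Za-z0-9_]+=(y|m)$' "$1" | cut -d= -f1 | LC_ALL=C sort -u > "$a"
    grep -E '^CONFIG_[A-Za-z0-9_]+=(y|m)$' "$2" | cut -d= -f1 | LC_ALL=C sort -u > "$b"
    # comm -23 keeps lines that appear only in the first list.
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Running it in both directions (A against B, then B against A) shows what each kernel enables that the other does not. It ignores the difference between built-in and module, and skips string or numeric options.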

@disconn3ct disconn3ct deleted the fix/calico branch December 17, 2024 15:18
@MichaIng
Owner

On my RPi 2 it says this:

root@micha:~# zgrep BLK_DEV_THROTTLING /proc/config.gz
# CONFIG_BLK_DEV_THROTTLING is not set

But RPi 4 and 5 indeed have it enabled: https://github.com/raspberrypi/linux/tree/rpi-6.6.y/arch/arm64/configs

It does not matter, however. The only reason it did not work was the missing routing table support; everything else is optional. Block device throttling is not at all related to networking: it is only something K3s might optionally use for container setups if available, in combination with the related cgroups. We'll keep it disabled until someone really needs it.

3 participants